
5.4.4 Prediction and residual values

All regression variants found in microdata.no have associated commands that generate, among other things, residual and prediction values. These values can be used to analyze the spread of the data and to test regression models. Prediction values can also be used as input for further analyses.

Each command has the same name as the associated regression command with the suffix -predict, for example regress-predict for regress.

Syntax:

regress-predict <variable> <variable list> [if <condition>] [, <options>]

The variables are specified in the same way as for the corresponding regression model run with the regress command.

The following values can be retrieved: prediction values, residuals, and Cook's distance.

You choose which values to generate through options. Each run produces a set of variables containing the requested values. By default, the first value type (prediction values) is generated, but it is still recommended to specify the value type through options, because this lets you name the generated variables inside the parentheses, as shown in the syntax example below. If you run several predict commands, you must give the generated variables new names each time.

Syntax example:

regress-predict wage age man wealth, residuals(res) predicted(pred) cooksd(cook)
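To make concrete what the three options return, the following is a minimal sketch in plain Python, run outside microdata.no, for a regression with a single explanatory variable. The function name simple_ols_diagnostics and the small data set are hypothetical, and the Cook's distance formula is the standard textbook definition, which is assumed (not confirmed) to match what regress-predict computes.

```python
def simple_ols_diagnostics(x, y):
    """Fit y = a + b*x by least squares and return per-observation diagnostics."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Prediction values: the fitted value for each observation.
    predicted = [intercept + slope * xi for xi in x]
    # Residuals: observed value minus prediction value.
    residuals = [yi - pi for yi, pi in zip(y, predicted)]

    # Cook's distance (textbook form): combines residual size and leverage.
    p = 2  # number of estimated coefficients (intercept and slope)
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    cooks = [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return predicted, residuals, cooks


# Hypothetical data: the last observation lies far from the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 15.0]
predicted, residuals, cooks = simple_ols_diagnostics(x, y)

for xi, pi, ei, di in zip(x, predicted, residuals, cooks):
    print(f"x={xi:.0f}  predicted={pi:.2f}  residual={ei:.2f}  cooksd={di:.2f}")
```

In this sketch the last observation gets the largest Cook's distance, because it has both a large residual and high leverage. This is the kind of observation the cooksd option is meant to help identify.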

The generated variables can be used as input for further analyses or displayed graphically. The graphical commands currently available are hexbin and histogram. Running a histogram on the residual variable lets you check whether the residuals are approximately normally distributed. The hexbin command can be used to create anonymized scatter plots that combine two sets of values.
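As an illustration of what a histogram of residuals summarizes, the following plain Python sketch, run outside microdata.no, counts residuals in equal-width intervals and prints a text bar for each interval. The function name histogram_counts and the residual values are hypothetical, and the binning rule is an assumption that need not match the histogram command.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; no spread to bin")
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last interval, not a new one.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return low, width, counts


# Hypothetical residuals from a regression run.
residuals = [-2.1, -1.2, -0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.3, 1.7]
low, width, counts = histogram_counts(residuals, bins=4)

for i, count in enumerate(counts):
    left = low + i * width
    print(f"{left:6.2f} to {left + width:6.2f} | {'#' * count}")
```

A roughly symmetric, bell-shaped pattern of bars centered near zero is consistent with normally distributed residuals. A skewed or multi-peaked pattern suggests the model may be misspecified.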

For more details, run the command help regress-predict.

Example: Prediction and residual values analysis